Introduction

On 2020-01-14 I run the analysis of Gene Ontology terms. Here I want to focus on part of those results and make some findings clearer. I will use results only from gene annotation, and not from annotation of transcripts, which is a somewhat noiser dataset. I will not look at the functions affected by hatching condition, since it only affects a couple of genes, to be inspected later. Among all cellular functions significantly enriched with genes differentially expressed among selective regimes, I will focus on those that are most significant, and detailed enough. For example, I skip terms like transmembrane transport, proteolysis or protein phosphorylation, because they are too general. Sections below are named for the functional categories that I think are worth discussing. I will try to contextualize the results. My entry point to the literature is (Garcia-Roger et al. 2019).

I need the results of the differential expression analysis (2020-01-08) to identify the genes annotated with the significant GO terms, and to assess the direction of gene expression change between selective regimes.

library(variancePartition)
library(GO.db)
library(ggplot2)
library(stringr)
ANNOTATION   <- '../2019-07-26/genes/annotation.txt'
EXPRESSION   <- '../2020-01-08/genes.RData'   # variance partition and the mixed models fitted with dream()
INTERPROSCAN <- '../2019-07-19/functional_annotation.tsv'  # annotation of transcripts
TRANS2GENE   <- 'transcript2gene.txt'   # created in this folder by README.sh from 2019-07-10/transcripts.fa

The annotation table includes only the most specific terms that can be assigned to a gene. The more general, ancestor terms are assumed, of course, but not explicitly mentioned in the annotation. For the purpose of retrieving all genes annotated to a term, I need to expand the annotation, to make ancestor terms explicit. It took me a while to realize that I can do this with the unlist() function:

annotation <- read.table(ANNOTATION, col.names = c('gene', 'GOterms'), colClasses = c('character', 'character'))
head(annotation)
##          gene                                                           GOterms
## 1 XLOC_000002                                  GO:0008417|GO:0016020|GO:0006486
## 2 XLOC_000009                                  GO:0043130|GO:0005515|GO:0043161
## 3 XLOC_000010                       GO:0005840|GO:0015935|GO:0003735|GO:0006412
## 4 XLOC_000015 GO:0006355|GO:0003700|GO:0006357|GO:0043565|GO:0005634|GO:0032502
## 5 XLOC_000021                       GO:0016567|GO:0006397|GO:0061630|GO:0008270
## 6 XLOC_000036                                  GO:0005272|GO:0006814|GO:0016020
annotation.list <- strsplit(annotation$GOterms, split='|', fixed=TRUE)
names(annotation.list) <- annotation$gene
# append can only join two vectors at a time. It works with lists!
allAncestors <- append(as.list(GOBPANCESTOR),
                       append(as.list(GOMFANCESTOR),
                              as.list(GOCCANCESTOR)))

fullAnnotation <- lapply(annotation.list,
                         FUN = function(x){
                           unique(append(x,
                                         unlist(allAncestors[x], use.names=FALSE)))
                         })
load(EXPRESSION)

Without any attempt to characterize gene function any further, I will make the assumption that the more genes in a pathway are expressed, the more active the pathway is. It is, however, conceivable that some genes are annotated as involved in a function that they regulate negatively.

Dormant or resting eggs are embryos in a dormant state. RNA was extracted either after forced diapause or not, but under hatching conditions. That is, a variable portion of eggs are expected to be leaving the dormancy stage and resuming metabolism, development and hatching. The portion of eggs getting ready to hatch is known to be larger (actually, a majority) in populations from regular environments. Thus, some of our results could resemble those from comparisons between resting and non-resting (amictic) eggs (Ziv et al. 2017, @Rozema2019).

We know that gene expression levels do not correlate with protein abundances (Ziv et al. 2017). In dormant embryos, where translation is downregulated (Ziv et al. 2017), this lack of correlation may be even more important. Many genes found to be expressed in dormant embryos may only be translated upon exit from dormancy. From this point of view, differences in gene expression between regimes may not reflect a difference in the active functions at the moment of RNA extraction, but differences in the original provision.

# By using 'paste(goterm,collapse='|')', I can retrieve all genes annotated with any of a set of GO terms.
getGenes <- function(goterm, annotation = fullAnnotation, varpart = varPart) {
  genes <- names(annotation[grep(paste(goterm, collapse='|'), annotation)])
  if (! is.null(varpart)) {
    genes <- genes[genes %in% row.names(varpart)]
  }
  return(genes)
}

volcanoPlot <- function(fit=fitmm, Coef='regime', num=length(fitmm$F), term=goterm, genes=getGenes(term)){
  DE <- topTable(fit, coef=Coef, number=num)
  DE$annotation <- factor('other', levels=c('other', Term(GOTERM[term])), ordered=TRUE)
  DE[row.names(DE) %in% genes, 'annotation'] <- Term(GOTERM[term])
  DE <- DE[order(DE$annotation),]
  ggplot(data=DE, mapping=aes(x=logFC, y=-log10(P.Value), color=str_wrap(annotation, width=30))) + geom_point() +
    labs(color='Annotation')
}

For adequate interpretation of some categories, I will need to check the original functional annotation performed with interproscan on 2019-07-19. From one of the output tables, I select the most informative columns, potentially including names of functional domains detected in transcripts. I also need the correspondance between genes and transcripts, which is found in 2019-07-10/transcripts.fa.

# The file is tab-separated. Some rows miss the last two fields. Some e-values are '-'. And
# functional description strings may include the single quote character.
interpro <- read.table(INTERPROSCAN, sep='\t', fill=TRUE, na.strings=c('-',''), quote='"',
                       colClasses=c('character','NULL','integer','factor','character','character',
                                    'integer','integer','numeric','NULL','NULL','character','character','character'),
                       col.names=c('transcript', NA, 'tsLength','analysis','acc','description', 'start', 'stop',
                                   'evalue', NA, NA,'IPacc','IPdescription','GO'))
interpro$transcript <- substr(interpro$transcript, 1, 14)   # to remove the .p1, .p2... suffix.

ts2gene <- read.table(TRANS2GENE, colClasses=c('character','character'), col.names=c('transcript','gene'))
# Having all the information, I want a function that outputs the interpro annotations of a gene, a
# set of genes, or a GO term.
showFunctions <- function(gene, goterm=NULL, transcriptTable=ts2gene, annotation=interpro){
  if (! is.null(gene)) {
    return(annotation[annotation$transcript %in% transcriptTable[transcriptTable$gene %in% gene, 'transcript'],])
  } else {
    return(showFunctions(getGenes(goterm)))
  }
}

Biological processes

Nucleotide-excision repair

It is among the most significantly enriched terms, with only 13 genes annotated. In all volcano plots below, fold changes are computed with the expression level in the unpredictable regime in the numerator: if the log fold-change is negative, the expression is higher in the regular environment, and viceversa.

goterm <- 'GO:0006289'
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
                        c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])

volcanoPlot()

The expression of NER genes is higher in dormant embryos from populations evolved in the unpredictable environment. I do not find any similar result in the literature, so that the significance of this result is open to interpretation. However, it is one of the most significant and clear results, in addition to potentially being a new discovery.

G protein-coupled receptor signaling pathway, and signal transduction

goterm <- 'GO:0007186'
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
                        c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])

volcanoPlot()

Among the 206 genes in this category, 55 have an adjusted p-value lower or equal to 0.1. Most of the genes in this pathway significantly regulated by selective regime, including the most significant ones, are underexpressed in the unpredictable regime (negative log fold change). Thus, in general we expect dormant eggs from populations under the unpredictable regime to be less sensitive to this particular pathway of signal transduction.

goterm <- 'GO:0007165'
genes <- getGenes(goterm)
volcanoPlot()

When we look at all 544 genes involved in signal transduction, we still see that most of them are underexpressed in the unpredictable regime, if differentially expressed at all. But the pattern is less clear, and some of these genes are also significantly overexpressed in the unpredictable regime. It is tempting to infer that dormant embryos from unpredictable environments are less sensitive to the unreliable external estimuli.

Cell-matrix adhesion

goterm <- 'GO:0007160'
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
                        c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])

volcanoPlot()

Among the top 16 genes involved in cell-matrix adhesion, 14 have a reduced expression level in the unpredictable environment, and 2 show an increased expression level.

Cell-matrix adhesion is likely related to embryonic development. The difference in expression of genes in this pathway between regimes makes more sense from the perspective of the expected difference in their current physiological state.

Potassium ion transport, carboxylic acid transport, and ion transmembrane transport

Several functions related to ion transport are significant. I wonder to what extent their significance is not dependent on a common subset of genes.

goterm_K    <- 'GO:0006813'
goterm_COOH <- 'GO:0046942'
goterm_Sulf <- 'GO:0008272'
goterm_Nucl <- 'GO:1901642'
goterm_TM   <- 'GO:0034220'

genes_K    <- getGenes(goterm_K)
genes_COOH <- getGenes(goterm_COOH)
genes_Sulf <- getGenes(goterm_Sulf)
genes_Nucl <- getGenes(goterm_Nucl)
genes_TM   <- getGenes(goterm_TM)

vennDiagram(vennCounts(cbind(Potassium     = row.names(varPart) %in% genes_K,
                             Carboxilic    = row.names(varPart) %in% genes_COOH,
                             Sulfate       = row.names(varPart) %in% genes_Sulf,
                             Nucleoside    = row.names(varPart) %in% genes_Nucl,
                             Transmembrane = row.names(varPart) %in% genes_TM)))

There is hardly any overlap among genes involved in the transport of these substances. The only remarkable overlap happens between potassium ion transport and ion transmembrane transport. But, still the significance of each term should be quite independent of the significance of any other term. This was expected, because the analysis of functional enrichment applied algorithms to account for the dependence structure among GO terms. I analyse them separately, below.

Potassium ion transport

goterm <- goterm_K
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
                        c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])

volcanoPlot()

This biological process is defined as “The directed movement of potassium ions (K+) into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore.”. There are 56 in the analysis with this annotation. The pattern is not very consistent.

Ion transmembrane transport

goterm <- goterm_TM
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
                        c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])

volcanoPlot()

Ion transmembrane transport is an even larger and more general term. One gene sticks out with a log fold change close to -4. This gene is XLOC_005433. It has homology to the neurotransmitter-gated ion-channel ligand-binding domain, and to the nicotinic acetylcoline receptor.

Nucleoside transmembrane transport

goterm <- goterm_Nucl
genes  <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
                        c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])

volcanoPlot()

The most significant gene in this subset is XLOC_035757, an equilibrative nucleoside transporter slightly more expressed in unpredictable than in predictable environment. Significant only with FDR 0.091.

Carboxylic acid transport

goterm <- goterm_COOH
genes  <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
                        c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])

volcanoPlot()

The two most significant genes here, XLOC_017768 and XLOC_017784, are slightly upregulated in the unpredictable environment. The former seems to be an amino acid permease, and the latter, a phospholipase A2. Both include a short predicted intrinsically disordered region.

Trehalose biosynthetic process

Trehalose is a disaccharide that confers protection against dessication in several organisms (n.d.). Trehalose is present in higher concentrations in resting eggs of B. plicatilis (which resist dessication) than in asexual eggs (Rozema et al. 2019). While trehalose is a very relevant molecule for dormant embryos, it is unclear why genes involved in its biosynthesis are expressed more in eggs from regular than from unpredictable environments.

goterm <- 'GO:0005992'
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
                        c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])

volcanoPlot()

Cell motility, cilium movement and cilium assembly

Adult individuals of B. plicatilis have a crown of cilia. Thus, the expression of genes for cilium assembly, and probably also those for cilium movement and cell motility in the embryo must be related to embryonic development. They are, thus a sign that the embryos are leaving the dormant state and resuming metabolism. Under this light, it makes sense that eggs from a regular environment express these genes more than eggs from the unpredictable environment, because most of the former must be getting ready to hatch. The higher expression level in predictable environments is a clear pattern in genes involved in all three of these processes.

Dr. Roger has actually observed embryo movement in many of the eggs from the predictable regime after 30 hours of exposure to hatching conditions (personal communication).

goterm_cellmoti <- 'GO:0048870'   # cell motility
goterm_movement <- 'GO:0003341'   # cilium movement
goterm_assembly <- 'GO:0060271'   # cilium assembly

genes_cellmoti <- getGenes(goterm_cellmoti)
genes_movement <- getGenes(goterm_movement)
genes_assembly <- getGenes(goterm_assembly)

vennDiagram(vennCounts(cbind('Cell motility'    = row.names(varPart) %in% genes_cellmoti,
                             'Cilium movement' = row.names(varPart) %in% genes_movement,
                             'Cilium assembly' = row.names(varPart) %in% genes_assembly)))

Cell motility

goterm <- goterm_cellmoti
genes  <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
                        c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])

volcanoPlot()

Cilium movement

goterm <- goterm_movement
genes  <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
                        c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])

volcanoPlot()

Cilium assembly

goterm <- goterm_assembly
genes  <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
                        c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])

volcanoPlot()

Proteasome-mediated ubiquitin-dependent protein catabolic process

goterm <- 'GO:0043161'
genes  <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
                        c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])

volcanoPlot()

It is unclear why preteasome-mediated protein degradation should be more or less active in one or the other regime. And the pattern is actually not consistent: two genes from this pathway are quite significantly (but not very much) more expressed in the unpredictable than in the regular environment. But several other genes in the pathway are less significantly, but to a higher degree underexpressed in the unpredictable regime.

Response to oxidative stress

Oxidative stress is one of the threats faced by dormant embryos, and protection against it is one of the classic hallmarks of dormancy (Clark et al. 2012, @Ziv2017).

goterm <- 'GO:0006979'
genes  <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
                        c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])

volcanoPlot()

In this case again, the differential expression does not have the same sign in all genes from this category that are regulated by the selective regime. It looks as if eggs in different regimes use different sets of genes to protect themselves against oxidative stress.

Molecular functions

Motor activity

goterm <- 'GO:0003774'
genes  <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
                        c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])

volcanoPlot()

The three most significant genes here are overexpressed in eggs from the unpredictable environment, which seems to contrast the results for biological processes of cell motility and cilium movement. However, motor activity is defined as Catalysis of the generation of force resulting either in movement along a microfilament or microtubule, or in torque resulting in membrane scission, coupled to the hydrolysis of a nucleoside triphosphate.. Only 1 genes involved in motor activity is also involved in cilium movement.

Peptidase activities: metallocarboxypeptidase activity, metalloendopeptidase activity, serine-type endopeptidase activity and calcium-dependent cysteine-type endopeptidase activity

goterm_metallocarboxy <- 'GO:0004181'
goterm_metalloendopep <- 'GO:0004222'
goterm_serinetypeendo <- 'GO:0004252'
goterm_calciumcystein <- 'GO:0004198'

genes_metallocarboxy <- names(fullAnnotation[grep(goterm_metallocarboxy, fullAnnotation, fixed=TRUE)])
genes_metalloendopep <- names(fullAnnotation[grep(goterm_metalloendopep, fullAnnotation, fixed=TRUE)])
genes_serinetypeendo <- names(fullAnnotation[grep(goterm_serinetypeendo, fullAnnotation, fixed=TRUE)])
genes_calciumcystein <- names(fullAnnotation[grep(goterm_calciumcystein, fullAnnotation, fixed=TRUE)])

vennDiagram(vennCounts(cbind(Metallocarboxy. = row.names(varPart) %in% genes_metallocarboxy,
                             Metalloendo.    = row.names(varPart) %in% genes_metalloendopep,
                             Serine_type = row.names(varPart) %in% genes_serinetypeendo,
                             Cystein_type = row.names(varPart) %in% genes_calciumcystein)))

Metallocarboxypeptidase activity

goterm <- 'GO:0004181'
genes  <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
                        c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])

volcanoPlot()

Serine-type endopeptidase activity

goterm <- 'GO:0004252'
genes  <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
                        c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])

volcanoPlot()

Calcium-dependent cysteine-type endopeptidase activity

goterm <- 'GO:0004198'
genes  <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
                        c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])

volcanoPlot()

Metalloendopeptidase activity

goterm <- 'GO:0004222'
genes  <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
                        c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])

volcanoPlot()

oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, reduced ascorbate as one donor, and incorporation of one atom of oxygen

goterm <- 'GO:0016715'
genes  <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
                        c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])

volcanoPlot()

Session info

sessionInfo()
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=ca_ES.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=ca_ES.UTF-8        LC_COLLATE=ca_ES.UTF-8    
##  [5] LC_MONETARY=ca_ES.UTF-8    LC_MESSAGES=ca_ES.UTF-8   
##  [7] LC_PAPER=ca_ES.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=ca_ES.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] stringr_1.4.0            GO.db_3.10.0             AnnotationDbi_1.48.0    
##  [4] IRanges_2.20.2           S4Vectors_0.24.3         variancePartition_1.16.1
##  [7] Biobase_2.46.0           BiocGenerics_0.32.0      scales_1.1.0            
## [10] foreach_1.4.7            limma_3.42.0             ggplot2_3.2.1           
## 
## loaded via a namespace (and not attached):
##  [1] bit64_0.9-7         splines_3.6.2       gtools_3.8.1       
##  [4] assertthat_0.2.1    blob_1.2.1          yaml_2.2.0         
##  [7] progress_1.2.2      pillar_1.4.3        RSQLite_2.2.0      
## [10] backports_1.1.5     lattice_0.20-38     glue_1.3.1         
## [13] digest_0.6.23       minqa_1.2.4         colorspace_1.4-1   
## [16] htmltools_0.4.0     Matrix_1.2-18       plyr_1.8.5         
## [19] pkgconfig_2.0.3     purrr_0.3.3         gdata_2.18.0       
## [22] lme4_1.1-21         BiocParallel_1.20.1 tibble_2.1.3       
## [25] farver_2.0.3        withr_2.1.2         lazyeval_0.2.2     
## [28] pbkrtest_0.4-7      magrittr_1.5        crayon_1.3.4       
## [31] memoise_1.1.0       evaluate_0.14       doParallel_1.0.15  
## [34] nlme_3.1-143        MASS_7.3-51.5       gplots_3.0.1.2     
## [37] tools_3.6.2         prettyunits_1.1.1   hms_0.5.3          
## [40] lifecycle_0.1.0     munsell_0.5.0       colorRamps_2.3     
## [43] compiler_3.6.2      caTools_1.18.0      rlang_0.4.2        
## [46] grid_3.6.2          nloptr_1.2.1        iterators_1.0.12   
## [49] labeling_0.3        bitops_1.0-6        rmarkdown_2.1      
## [52] boot_1.3-24         gtable_0.3.0        codetools_0.2-16   
## [55] DBI_1.1.0           reshape2_1.4.3      R6_2.4.1           
## [58] knitr_1.27          dplyr_0.8.3         bit_1.1-15.1       
## [61] zeallot_0.1.0       KernSmooth_2.23-16  stringi_1.4.5      
## [64] Rcpp_1.0.3          vctrs_0.2.1         tidyselect_0.2.5   
## [67] xfun_0.12

References

Clark, Melody S., Nadav Y. Denekamp, Michael A. S. Thorne, Richard Reinhardt, Mario Drungowski, Marcus W. Albrecht, Sven Klages, Alfred Beck, Michael Kube, and Esther Lubzens. 2012. “Long-Term Survival of Hydrated Resting Eggs from Brachionus plicatilis.” PLoS ONE 7 (1): e29365. https://doi.org/10.1371/journal.pone.0029365.

Garcia-Roger, Eduardo M., Esther Lubzens, Diego Fontaneto, and Manuel Serra. 2019. “Facing Adversity: Dormant Embryos in Rotifers.” The Biological Bulletin 237 (2): 119–44. https://doi.org/10.1086/705701.

Rozema, Evelien, Sylwia Kierszniowska, Oshri Almog-Gabai, Erica G. Wilson, Young Hae Choi, Robert Verpoorte, Reini Hamo, Vered Chalifa-Caspi, Yehuda G. Assaraf, and Esther Lubzens. 2019. “Metabolomics Reveals Novel Insight on Dormancy of Aquatic Invertebrate Encysted Embryos.” Scientific Reports 9: 8878. https://doi.org/https://doi.org/10.1038/s41598-019-45061-x.

Ziv, T., V. Chalifa-Caspi, N. Denekamp, I. Plaschkes, S. Kierszniowska, I. Blais, A. Admon, and E. Lubzens. 2017. “Dormancy in Embryos: Insight from Hydrated Encysted Embryos of an Aquatic Invertebrate.” Molecular and Cellular Proteomics 16 (10): 1746–69. https://doi.org/10.1074/mcp.RA117.000109.

n.d.